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BACKGROUND OF THE INVENTION 
The proliferation of cheap sensors and increased processing power have made 
the acquisition and processing of video information more readily and economically 
feasible. Real-time video analysis tasks such as object detection and tracking can 
increasmgly be performed efficiently on standard PC's for a variety of applications 
such as- industnal automation, transportation, automotive, security and surveillance, 
and communications. The use of stationary cameras is fairly common in a number of 
applications. 

Background modeling and subtraction is a core component in motion analysis. 
A central idea behind such a module is to create a probabilistic representation of the 
static scene that is compared with the current input to perform subtraction. Such an 
approach is efficient when the scene to be modeled refers to a static structure with 
limited perturbation. 

Background subtraction is a core component in many surveillance applications 
where the objective is to separate the foreground from the static parts of the scene. 
The information provided by such a module can be considered as a valuable low-level 
visual cue to perform high-level tasks of object analysis, like object detection, 
tracking, classification and event analysis. See, for example, Remagnino, P., G. 
Jones N. Paragios, and C. Regazzoni: Video-Based Surveillance Systems: Computer 
Vision and Distributed Processing, Kluwer Academic Publishers, 2001; Mittal, A. 
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203, 2003; Grimson,W., C Stauffer, R. Romano, and L. Lee: 1998, 'Using adaptive 
tracking to classify and monitor activities in a site', IEEE International Conference 
on Computer Vision and Pattern Recognition. Santa Barbara, CA, 1998; Ivanov, Y. 
and A. Bobick, 'Recognition of Multi-Agent Interaction in Video Surveillance', 

5 IEEE International Conference on Computer Vision. Kerkyra, Greece, pp. 169-176, 
1999; Cohen, I. and G. Medioni, 'Detecting and Tracking Moving Objects in Video 
Surveillance', IEEE International Conference on Computer Vision and Pattern 
Recognition. Ft. Collins, CO, pp. II: 319-325; Boult, T., R. Mecheals, X. Gao, and M. 
Eckmann, 'Into the woods: visual surveillance of non-cooperative and camouflaged 

10 targets in complex outdoor settings', Proceedings of the IEEE pp. 1382-1402, 2001; 
Stauffer, C. and W. Grimson, 'Learning Patterns of Activity Using Real-Time 
Tracking', IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 
747-757, 2000; and Collins, R., A. Lipton, H. Fujiyoshi, and T. Kanade: 2001, 
'Algorithms for Cooperative Multi-Sensor Surveillance', Proceedings of the IEEE 

15 89(10), 1456-1477,2001). 

A basis for the development of the subspace method for scene 
prediction is found in the work of Soatto et. al. (Soatto et al., Soatto, S., G. Doretto, 
and Y. Wu: 2001, 'Dynamic Textures', IEEE International Conference on 
ComputerVision. Vancouver, Canada, pp. II: 439-446, 2001, and in the publication 
20 by Doretto, G., A. Chiuso, Y. Wu, and S. Soatto: 2003, 'Dynamic Textures'. 
InternationalJournal of ComputerVision 51(2), 91-109, 2003, including an 
implementation of their algorithm and an implementation of Incremental PC A due to 
Silviu Minut. 

To this end, one has to obtain a representation of the background, update this 
25 representation over time and compare it with the actual input to determine areas of 
discrepancy. 

Such methods have to be adaptive and able to deal with gradual changes of the 
illumination and scene conditions. Methods for background modeling may be 
classified into two types: predictive and statistical. 

30 Existing methods in the literature can effectively describe scenes that have a 

smooth behavior and limited variation. Consequently, they are able to cope with 
gradually evolving scenes. However, their performance deteriorates (Figure 2) when 
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the scene to be described is dynamic and exhibits non-stationary properties in time. 
Examples of such scenes are shown in Figures 1 and 10 and include ocean waves, 
waving trees, rain, moving clouds, etc. 



5 BRIEF SUMMARY OF THE INVENTION 

It is herein recognized that, in accordance with the first class of methods, 
predictive methods attempt to model the scene as a time series and develop a 
dynamical model to predict the current input based on past observations. The 
magnitude of the deviation between the predicted and actual observation can then be 
10 used as a measure of change. 

A second class of methods herein recognized and herein referred to as 
statistical methods, neglect the order of the input observations and attempt to build a 
probabilistic representation, that is, a probability density function (p.d.f.) of 
observations at a particular pixel. A new observation can then be classified as 
15 background or foreground based on the probability of this observation belonging to 
the background. 

It is an object of the present invention to provide background modeling 
techniques of extended scope to include scenes that exhibit a consistent pattern of 
change of the observation space in the spatio-temporal domain. See, for example, 
20 Schodl, A., R. Szeliski, D. Salesin, and I. Essa: 2000, 'Video Textures', Proceedings 
of A CM SIGGRAPH Conference. New Orleans, LA, 2000; and Murase, H. and S. 
Nayar, 'Visual Learning And Recognition Of 3-D Objects From Appearance'. IJCV 
14(1), 5-24, 1995. 

It is a further object of the present invention to address the problem of 
25 modeling dynamic scenes where the assumption of a static background is not valid. 
Waving trees, beaches, escalators, natural scenes with rain or snow are examples. 

In accordance with an aspect of the present invention, methods are disclosed 
for the modeling of such scenes. A first method utilizes optical flow for capturing and 
modeling the dynamics of the scene. The uncertainties in the measurements are 
30 evaluated and utilized in order to develop a robust representation of the scene in a 
higher dimensional space. A second method, develops a dynamical model of the 
scene that utilizes multiple past frames to predict the next frame. Incremental methods 
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for updating the model are developed. This second method, follows on work 
proposed by Doretto et al. (See the above-cited publication by Doretto et al. , 2003). 

In accordance with another aspect of the present invention, a measure is 
introduced that is based on a state-driven comparison between the prediction and the 

5 actual observation. 

In accordance with an aspect of the invention, techniques herein disclosed for 
background modeling of dynamic scenes include a method comprising a statistical 
technique and a method comprising a predictive technique. 

In accordance with an aspect of the invention, the statistical method utilizes 

10 optical flow for capturing and modeling the dynamics of the scene. Along with 

optical flow, the intensity at a pixel is considered in an illumination-invariant space. 
Such transformation, as well as optical flow computation, leads to heteroscedastic 
(point-dependent) noise in the data. These uncertainties in the measurements are 
evaluated and utilized to develop a robust representation of the scene in a higher 

15 dimensional space. Such representation can be built efficiently in a nonparametric 
manner within a window of past observations. A new observation can then be 
compared with this representation in order to detect changes. 

In accordance with an embodiment of the invention, a method for dynamic 
scene modeling and change detection applicable to motion analysis utilizes optical 

20 flow for capturing and modeling the dynamics of the scene. Uncertainties in the 

measurements are evaluated and utilized in order to develop a robust representation of 
the scene in a higher dimensional space. In another embodiment, a dynamical model 
of the scene is developed that utilizes multiple past frames to predict the next frame. 
Incremental methods for updating the model are developed and, towards detection of 

25 events, a new measure is introduced that is based on a state-driven comparison 
between the prediction and the actual observation. 

In accordance with another aspect of the invention, a method for scene 
modeling and change detection in an image, comprises the steps of: computing optical 
flow for the image; performing an invariant transformation such that image pixel 
30 intensity is transformed and evaluated in an illumination-invariant space; forming a 
background model in a high-dimensional space; utilizing results of the computing 
optical flow and of the invariant transformation as features in the background model; 
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utilizing the background model to estimate probability for a current input to belong to 
the background; providing a first and a second indication whenever the probability is 
respectively above and below a given threshold; adding the current input to the image 
background model whenever the probability is above the threshold; adding the current 
5 input to the image background model with a low probability whenever the probability 
is below the threshold; and performing morphological operations on the pixel-level 
detection for outputting detection. 

In accordance with another aspect of the invention, the background model is 
developed in a high-dimensional space using kernel density estimation. 

10 In accordance with another aspect of the invention, a method for dynamic 

scene modeling and change detection applicable to motion analysis, comprises the 
steps of: making measurements of optical flow and intensities for the dynamic scene; 
utilizing the optical flow measurements for capturing and modeling the dynamics of 
the scene; evaluating uncertainties in the measurements; and combining the optical 

15 flow measurements, the intensities, and the uncertainties so as to develop a robust 
representation of the scene in a higher dimensional space. 

In accordance with another aspect of the invention, the step of evaluating 
uncertainties in the measurements comprises the steps of: applying an optical flow 
constraint equation at a given point defined by a spatial location and time to obtain 

20 respective constraints; applying an error function to combine the respective 
constraints from each the given point within a defined region for deriving a 
characteristic function; deriving a motion estimate from the characteristic function; 
and comparing the motion estimate with a given uncertainty model so as to derive a 
figure of uncertainty for optical flow measurement data. 

25 In accordance with another aspect of the invention, method for dynamic scene 

modeling and change detection applicable to motion analysis, comprises the steps of: 
deriving optical flow measurement data for the dynamic scene; deriving intensity data 
for the dynamic scene; deriving flow covariance matrices for the optical flow 
measurement data; deriving intensity covariance matrices for the intensity data; 

30 applying relevance criteria to the flow and the intensity covariance matrices for 

selecting relevant features of a subset thereof; and combining the relevant features of 
the subset for performing detection. 
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In accordance with another aspect of the invention, a method for dynamic 
scene modeling and change detection applicable to motion analysis, comprising the 
steps of: making measurements of intensity; deriving measurements of optical flow 
for the dynamic scene; building a probability distribution of the intensity and the 
optical flow in a joint 5-dimensional space comprising 3 color components and 2 flow 
components; utilizing the optical flow measurements for capturing and modeling the 
dynamics of the scene; and combining the optical flow measurements, the intensities 
and the probability distribution so as to develop a robust representation of the scene in 
a higher dimensional space. 

In accordance with another aspect of the invention, apparatus for scene 
modeling and change detection in an image, compres: apparatus for computing optical 
flow for said image; apparatus for performing an invariant transformation such that 
image pixel intensity is transformed and evaluated in an illumination-invariant space; 
apparatus for forming a background model in a high-dimensional space; apparatus for 
utilizing results of said computing optical flow and of said invariant transformation as 
features in said background model; apparatus for utilizing said background model to 
estimate probability for a current input to belong to the background; apparatus for 
providing a first and a second indication whenever said probability is respectively 
above and below a given threshold; apparatus for adding said current input to said 
image background model whenever said probability is above said threshold; apparatus 
for adding said current input to said image background model with a low probability 
whenever said probability is below said threshold; and apparatus for performing 
morphological operations on said pixel-level detection for outputting detection. 

In accordance with another aspect of the invention, apparatus for dynamic 
scene modeling and change detection applicable to motion analysis, comprises: 
apparatus for inputting an image of the scene, including previously stored frames 
thereof; apparatus for dividing the image into blocks, the blocks being represented as 
respective block vectors; apparatus for forming a current state vector of values 
derived by forming the dot product of respective ones of the block vectors with the 
basis vectors; apparatus for deriving an auto-regressive model using state vectors 
observed for the previously stored frames; and apparatus for testing whether the 
current state vector can be projected onto the basis vectors and for determining if the 
current state vector cannot be projected onto the basis vectors, then indicating that a 



7 



10 



15 



20 



new object is present that is moving differently from its background. 

The predictive method treats the image as a time series and develops a 
predictive model to capture the most important variation based on a sub-space 
analysis of the signal, work; for background information on this approach, see, for 
example, Doretto, G., A. Chiuso, Y. Wu, and S. Soatto: 2003, 'Dynamic Textures'. 
International Journal oj Compulsion 51(2), 91-109, 2003. In accordance with 
an aspect of the invention, the components of this model are used in an autoregress.ve 
form to predict the frame to be observed. Differences in the state space between the 
prediction and the observation quantify the amount of change and are considered to 
perform detection. Two different techniques are disclosed for maintaining the model: 
one that update the states in an incremental manner and one that replaces the modes of 
variation using the latest observation map. 

The present description also provides an overview of existing techniques in 
background modeling and presents the method in accordance with the present 
invention for background subtraction using optical flow. Thereafter, a description 
follows of the method in accordance with the present invention for predictive 
modeling of dynamic scenes in a subspace, followed by a review of the comparative 
advantages and disadvantages of the statistical and predictive approaches. 

These and other aspects of the invention in addition to those described in the 
foregoing brief summary of the invention will be explained more fully in the detailed 
description which follows below. 
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BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS 
The invention will be more fully understood from the detailed description of the 
preferred embodiments which follows, in conjunction with the Drawings, in which 
Figure 1 shows frames from an image sequenceof an ocean shore, forthe purpose of 
illustrating features of the present invention; 

Figure 2 shows original images (a) and detection results using (b) a mixture of 
Gaussians model; (c) a non-parametric model; and (d) a non-parametric model with 
low detection threshold; 
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Figure 3 shows a flowchart relating to an embodiment of the present invention; 

Figure 4 shows a flowchart relating to another embodiment of the present invention; 

Figure 5 shows detection results using an optical flow-based method in accordance 

with an embodiment of the present invention; 
5 Figure 6 shows basis vectors in accordance with an embodiment of the present 

invention, as follows: (a) mean, (b,c) mean + constant x (6 principal) basis vectors, 

with insignificant vectors represented by a dark color; 

Figure 7 shows in accordance with an embodiment of the present invention (a) 

number of retained eigenvectors and (b) magnitude of the largest eigenvalue; 
10 Figure 8 shows in accordance with an embodiment of the present invention (a) an 

input signal and (b) a prediction; 

Figure 9 shows in accordance with an embodiment of the present invention (a) input 
frames and (b) detection components, wherein in each block pink represents r u green 
represents r 2 , and white represents detection by combining n and r 2 ; 
15 Figure 10 shows results from a sequence of a road with waving trees, wherein are 
shown: left: input signal, middle: predicted signal, right, block-wise response of the 
detection measures, wherein green represents r, , pink represents r 2 , and white 
represents detection in a block; 

Figure 1 1 shows results from a sequence, wherein are shown: left: input signal, 
20 middle: predicted signal, right: block-wise response of the detection measures; 

Figure 12 shows results from another sequence, wherein in the later stages, a person is 
detected in spite of hiding behind a fence; and 

Figure 13 shows a receiver-operator characteristic (ROC) curves for a sequence for 
(i) mixture-of-Gaussians model, (ii) non-parametric kernels, and (iii) an embodiment 
25 in accordance with the present invention; 

Figure 14 shows a receiver-operator characteristic (ROC) curves for another sequence 
for (i) mixture-of-Gaussians model, (ii) non-parametric kernels, and (iii) an 
embodiment in accordance with the present invention. 
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DETAILED DESCRIPTION OF THE INVENTION 
In the description of the preferred embodiments of the present invention, 
existing techniques are first reviewed. 

Various methods have been proposed in the literature for developing a pixel- 
wise statistical representation of the scene of interest. A simple model keeps a single 
background image. See, for example, Horprasert, T„ D. Harwood, and L. Davis, 'A 
Statistical Approach for Real-time Robust Background Subtraction and Shadow 
Detection', Frame-Rate99, 1999; Ohta, N, 'A Statistical Approach to Background 
Subtraction for Surveillance Systems', IEEE International Conference on Computer 
Vision, Vancouver, Canada, pp. II: 481-186, 2001; and Greiffenhagen, M., V. 
Ramesh, D. Comaniciu, and H. Niemann, 'Statistical Modeling and Performance 
Characterization of a Real-Time Dual Camera Surveillance System', IEEE 
International Conference on Computer Vision and Pattern Recognition. Hilton Head, 
SC, pp. 11:335-342, 2000. The afore-mentioned discuss methods to perform 
illumination invariant change detection using such background representation. In line 
with this direction, a more advanced technique is to keep a running average of the 
intensity, which would be a computationally efficient approach to provide a rough 
description of the static scene in the absence of moving objects. Variations of this 
method include: taking the median of the observed values; calculating spatially 
weighted values in order to reduce the effect of outliers; and keeping the maximum, 
minimum and largest consecutive values. See, for example, Haritaoglu, I., D. 
Harwood, and L. Davis, <W4: Real-Time Surveillance of People and Their Activities', 
IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 809-830, 
2000. Such methods do not explicitly model background versus foreground, and 
therefore they may not be suitable for scenes where many moving objects are present 
and acquisition of a background image is not easy. 

Variations of the visual properties of image observation that correspond to a 
static scene may be reasonably modeled with a Normal distribution. Under the 
assumption of zero mean Normal distribution N(0, 1) for the noise in the 
observation, the pixel intensity may be modeled with a Normal distribution tf(u, I). In 
the publication by Wren, C, A. Azarbayejani, T. Darrell, and A. Pentland, 'Pfinder: 
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Real-Time Tracking of the Human Body', IEEE Transactions on Pattern Analysis 
and Machine Intelligence 19(7), 780-785, 1997), a single Gaussian is considered to 
model the statistical distribution of a background pixel. Output from their 
foreground-background classification algorithm can then be used to determine which 

5 pixels belong to the background. Such decisions are then used to update the mean u 
and covariance matrix I of the background Gaussian incrementally. 

In the publication by Friedman, N. and S. Russell, 'Image Segmentation in 
Video Sequences: A Probabilistic Approach', Proc. of the Thirteenth Conference on 
Uncertainty in Artificial Intelligence^ Al)> 1997, a mixture of three Normal 

10 distributions is utilized to model the visual properties in traffic surveillance 

applications. Three hypotheses are considered: road; shadow; and vehicles. The EM 
algorithm is an optimum method for simultaneously recovering both the parameters of 
the individual models and the classification of the data into different groups, since this 
is a "chicken-and-egg" type of problem. However, due to the computational 

15 complexity of the algorithm and real-time update requirements of the application, an 
incremental EM algorithm is considered to learn and update the model parameters. 
The background (i.e. road), however, is still modeled by a single Gaussian in this 



case. 



In the publication by Stauffer, C. and W. Grimson: 'Adaptive background 
20 mixture models for real-time tracking', IEEE International Conference on Computer 
Vision and Pattern Recognition. Ft. Collins, CO, p. II: 252, 2000, this idea is extended 
by using multiple Gaussians to model the scene. Such an approach is capable of 
dealing with multiple hypotheses for the background and can be useful in scenes such 
as waving trees, beaches, escalators, rain or snow. In order to improve the efficiency 
25 of the method, as opposed to EM, the authors propose a simple exponential update 
scheme for the mixture model. The probability that a certain pixel has intensity x t at 
time / is estimated as: 

where Wj the weight, » the mean and ^ =(JjI the covariance matrix for theyth 
30 Gaussian are the parameters to be learnt. The k distributions are ordered based on 
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W J 1 °J and the first B distributions are used as the model for the background. Here, B 
is estimated as: 



B = arg min 



Vf i 

>T 



The threshold T is the fraction of the total weight given to the background model. A 
pixel is assigned to a Gaussian if it is within a distance of 2.51 from its mean. Using 
this classification, the weights are updated as follows: 

where a is the learning rate and M k ,, is 1 for the model which matched and 0 for the 
remaining models. The model parameters for the matched distribution are updated as 



follows: 



where p = ap(X,|^ kl o K ). In the publication by Mittal, A. and D. Huttenlocher: 
'Scene Modeling for Wide Area Surveillance and Image Synthesis', IEEE 
International Conference on Computer Vision and Pattern Recognition. Hilton Head, 
SC, pp. II: 160-167, 2000, a modification of the above scheme is proposed by the use 
of constant weighting along with exponential weighting and specify methods to 
determine the scheme to be used at a particular time. See also Paragios, N. and V. 
Ramesh, 'A MRF-based Approach for Real-Time Subway Monitoring', IEEE 
International Conference on Computer Vision and Pattern Recognition. Hawaii, pp. 
1:1034-1040, 2001. 

The mixture-of-Gaussians method is quite popular and was to be the basis for 
a large number of related techniques. See Greiffenhagen, M., V. Ramesh, and H. 
Niemann, 'The Systematic Design and Analysis Cycle of a Vision System: A Case 
Study in Video Surveillance', IEEE International Conference on Computer Vision 
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and Pattern Recognition. Hawaii, pp. 11:704-71 1, 2001; Mittal, A. and D. 
Huttenlocher: 2000, 'Scene Modeling for Wide Area Surveillance and Image 
Synthesis', IEEE International Conference on Computer Vision and Pattern 
Recognition. Hilton Head, SC, pp. II: 160- 167, 2000; Wang, D., T. Feng, H. Shum, 
and S. Ma: 'A Novel Probability Model for Background Maintenance and 
Subtraction', International Conference on Vision Interface, p. 109, 2002; Javed, O., 
K. Shafique, and M. Shah: 'A hierarchical approach to robust background subtraction 
using color and gradient information', IEEE Workshop on Motion and Video 
Computing. Orlando, Florida, pp. 22-27, 2002; and Harville, M.: 'A Framework for 
High-Level Feedback to Adaptive, Per-Pixel, Mixture-of-Gaussian Background 
Models', European Conference on Computer Vision. Copenhagen, Denmark, p. Ill: 
543 ff, 2002. In the publication by Gao, X., T. Boult, F. Coetzee, and V. Ramesh: 
'Error Analysis of Background Adaption', IEEE International Conference on 
Computer Vision and Pattern Recognition. Hilton Head Island, SC, pp. I: 503-510, 
2000, a statistical characterization of the error associated with this algorithm is 
disclosed. 

Parametric methods are a reasonable compromise between low complexity 
and a fair approximation of the visual scene when it obeys the general assumptions 
imposed by the selected model. When these assumptions fail, non-parametric 
approaches are more suitable. A popular non-parametric approach is to use kernels. 
In this method, a kernel is created around each of the previous samples and the 
density is estimated using an average over all the kernels: 

While different kernels can be considered, the Normal kernel was the one that was 
proposed in the publication by Elgammal, A., D. Harwood, and L. Davis: 'Non- 
parametric Model for Background Subtraction', European Conference on Computer 
Vision. Dublin, Ireland, pp. 11:751-767, 2000 to model the intensity at a particular 
pixel: 
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There follows first a descriptive overview of a method in accordance with the 
invention for background modeling that is able to account for dynamic backgrounds 
that change according to a certain pattern. 

Figure 3 shows a flowchart relating to an embodiment utilizing a non-predictive 
5 density-based method in accordance with an aspect of the invention. There follows 
first a description of a non-predictive density-based method in accordance with an 
aspect of the invention that neglects the order in which the observations occur. This is 
a valid assumption when the scene is static or almost static and wherein the changes 
in the appearance of the background are extremely gradual. In scenes where there is a 
,0 more drastic change in the background, however, this assumption does not hold. A 
possible solution or fix to this limitation is to capture the dynamics of the scene using 
inter-frame motion characteristics. In this section, a method is disclosed based on this 
idea, in accordance with an aspect of the present invention. Although temporal 
relationships beyond two frames are still lost, very promising results have been 
15 obtained using this technique in accordance with an aspect of the present invention. 
More complex predictive methods that are able to capture longer term temporal 
relationships will explained in the next section. 

Next, optical flow-based background modeling using variable bandwidth is 
considered, in accordance with the principles of the invention. Considering first the 
20 question of density estimation: 

A possible mechanism to model and account for the motion characteristics of 
the scene is through the evaluation of optical flow. Such flow vectors can be 
considered to define a multivariate space, along with intensity, and density estimation 
can be performed in this higher dimensional space for the purpose of object detection. 

Several issues need to be addressed in order to obtain robust object detection 
in this higher-dimensional space. Optical flow estimation can have significant and, 
possibly directional, errors associated therewith. The one-dimensionality of the 
spatial image structure causes the so-called aperture problem. The expression refers 
to the fact that the motion of a moving one-dimensional pattern viewed through a 
circular aperture is ambiguous. In locations where the spatial gradient vanishes, there 
is no constraint on the velocity vector. This is often called the blank wall problem. In 
conjunction with the errors in measuring the temporal derivative and in the optical 
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flow equation (see Verri, A. and T. Poggio: 'Motion Field and Optical Flow: 
Qualitative Properties'. IEEE Transactions on Pattern Analysis and Machine 
Intelligence 11(5), 490-498, 1989), this leads to heteroscedastic noise where the error 
in the measurement is point-dependent. 

5 Furthermore, the necessity of considering the visual properties in an invariant 

space (such as the normalized color space) introduces heteroscedastic noise in the 
transformed variables. See the above-cited publication by Greiffenhagen et al., 2000. 
Therefore, the multivariate density in the 5D space (3 for color and 2 for optical flow) 
must be estimated after properly accounting for and using the uncertainties associated 

10 with the measurements. 

In accordance with an aspect of the present invention, a novel kernel-based 
method will next be described for estimating the multivariate probability density that 
takes into account the errors in both the sample observations and the estimation (test) 
observation. In such modeling, it is assumed that the uncertainties in the 
15 measurements are available. Thereafter, methods to recover such uncertainties will be 
presented. To this end, methods are herein disclosed for computation of optical flow 
that make use of and yield the inherent uncertainty in the optical flow computation, 
methods for estimating the uncertainties in the invariant transformations of the 
intensity (RGB) and certain assumptions that allow combining the two. 

20 Next, variable bandwidth kernel density estimation is considered. Assuming 

that flow measurements and their uncertainties are available, a theoretical framework 
is herein provided for obtaining an estimate of the probability distribution of the 
observed data in a higher-dimensional space which, in the present case, is 5 — two 
components for the flow and three components for the intensity in the normalized 

25 color space. 

Several methods - parametric and non-parametric - can be considered for 
constructing this probability distribution. A mixture of multi-variate Gaussians could 
be considered to approximate this distribution. The parameters of the model, i.e. the 
mean and the covariance matrix of the Gaussians, can be estimated and updated in a 
30 manner similar to that disclosed in Stauffer, C. and W. Grimson: 'Adaptive 

background mixture models for real-time tracking', IEEE International Conference 
on Computer Vision and Pattern Recognition. Ft. Collins, CO, p. II: 252, 1999. Care 
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has to be exercised, however, in dealing with the uncertainties in a correct manner. 

A more suitable approach refers to a non-parametric method that has the 
characteristic of being able to deal with the uncertainties in an accurate manner. The 
disadvantage of the method is its high computational expense. With the rapid 
increase in the computational power of processors, an exemplary embodiment of this 
method in accordance with the present invention is already running in quasi real-time 
on a 160 x 120 3-band video on a Pentium IV 2.2GHz processor machine. 

The most common and popular method used in the statistics literature for 
modeling multivariate probability distributions from sample points is the to/-based 
density estimation. See Scott, D.: Multivariate Density Estimation, Wiley- 
Interscience, 1992; and Fukunaga, K, Introduction to Statistical Pattern Recognition. 
Academic Press, Boston, second edition, 1990. In particular, such selection is quite 
appropriate when the sample points have variable uncertainties associated with them. 
Then, the kernel-based approach yields a powerful method for integrating the 
1 5 information available from the samples. 

Let x h x 2 , x„ be a set of d dimensional points in R d and H be a symmetric 
positive definite dxd matrix (called the bandwidth matrix). Let K : R d -> R 1 be a 
kernel satisfying certain conditions that will be defined later. 

Then the multivariate fixed bandwidth kernel estimator is defined as (see the 
20 above-cited publication by Scott, 1992; and see Wand, M. and M. Jones: Kernel 
Smoothing, Chapman and Hall, 1995): 



(1) 



25 where K H (x) - H-' /2 K(H' 1/2 x). The matrix H is the smoothness parameter and 
specifies the "width" of the kernel around each sample point x. 

A well-behaved kernel K must satisfy the following conditions: 

[ d K{w)dw=\, [wK{w)dw=Q, [ww T K(w)dw=I d 
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The first equation accounts for the fact that the sum of the kernel function over 
the whole region is unity. The second equation imposes the constraint that the means 
of the marginal kernels {K^.i = 1....4} are all zero. The third equation states that 
the marginal kernels are all pair-wise uncorrected and that each has unit variance. 

It is possible to utilize such an approach for background modeling and for 
clarity purposes, and the outline of a method that may be employed is herein 
provided. First, the optical flow is estimated (see Lucas, B. and T. Kanade: 'An 
Iterative Image Registration Technique with an Application to Stereo Vision', 
DARPA81, pp. 121-130, 1981; and Horn, B. and E. Weldon-Jr.: 'Direct Methods 
for Recovering Motion'. International Journal of ComputerVision 2(1), 5 1-76, 1988) 
wherein are disclosed some popular methods) and the intensity is transformed into an 
invariant normalized color space comprising three dimensions - normalized r = R/(R 
+ G + B) and g = G/(R + G + B), and the average intensity I = (R + G + B)/3. Then, 
one can build a 5D feature vector x comprising 2 features for the optical flow and 3 
features for the intensity in the normalized space. Neglecting the uncertainties in the 
optical flow and the transformation, the bandwidth matrix H may be specified 
manually such that the different features are weighted appropriately. For example, the 
intensity I should be weighted less for robustness to changes in illumination. The 
probability density can then be estimated from the n previous observations using 
Equation 1. 

Although such a fixed bandwidth approach can be used, use of variable 
bandwidth usually leads to an improvement in the accuracy of the estimated density. 
The reason is that a smaller bandwidth is more appropriate in regions of high density 
since a larger number of samples enables a more accurate estimation of the density in 
these regions. On the other hand, a larger bandwidth is more appropriate in low 
density areas where few sample points are available. 

It is possible to consider a bandwidth function that adapts to not only the point 
of estimation, but also the observed data points and the shape of the underlying 
density. In the literature, two simplified versions have been studied. The first varies 
the bandwidth at each estimation point and is referred to as the balloon estimator. The 
second varies the bandwidth for each data point and is referred to as the sample-point 
estimator. Thus, for the balloon estimator, 
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ruf^ l YfC (x-x))=-t { ——K(H(xr U2 (x-x t )) 

where H(x) is the smoothing matrix for the estimation point x. For each point at 
which the density is to be estimated, kernels of the same size and orientation are 
centered at each data point and the density estimate is computed by taking the average 
of the heights of the kernels at the estimation point. A popular choice for the 
bandwidth function in this case is to restrict the kernel to be spherically symmetnc. 
Thus only one independent smoothing parameter remains h k (x) which is typically 
estimated as the distance from x to the fth nearest data point. This estimator suffers 
from several disadvantages - discontinuities, bias problems and integration to infinity. 

An alternate strategy is to have the bandwidth matrix be a function of the 
sample points. This is the sample-point estimator (see the above cited reference to 
Wand and Jones, 1995; Comaniciu, D, 2003a, 'An Algorithm for Data-Dnven 
Bandwidth Selection', IEEE Transactions on Pattern Analysis and Machine 
lntelli g ence 25(2), 2003a; Comaniciu, D., V. Ramesh, and P. Meer: 'The Variable 
Bandwidth Mean Shift and Data-Driven Scale Selection', IEEE International 
Conference on Computer Vision. Vancouver, Canada, pp. I: 438^45, 2001): 

rt^ l tK (x-x))=-f l —-j T K(H(x i T V2 (x-x l )) 



The sample-point estimator still places a kernel at each data point, but these 
kernels each have their own size and orientation regardless of where the density is to 
be estimated. This type of estimator was introduced by Breiman, L.,W. Meisel, and E. 
Purcell: 'Variable Kernel Estimates of Multivariate Densities'. Technometncs 19, 
135_144, 1977, who suggested using 
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H(x < )=A(x < )/ 
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where h( Xi ) is the distance from * to the Mi nearest data point. Asymptotically, tms 
is equivalent to choosing ftfc> -/^ where d is the dimension of the data. A 
popular cho.ce for the bandwidth function, suggested by Abramson (see Abramson, 
I • 'On Bandwidth Variation in Kernel Estimates- A Square Root Law', The Annals 
of Statistics 10, 1217-1223, 1982) is to use ft» -JWJ^ and, in practice, to use a 
pilot estimate of the density to calibrate the bandwidth function. 

A hybrid density estimator is herein introduced, wherein the bandwidth is a 
function not only of the sample point but also of the estimation point x. The 
particular property of the data that will be addressed is the availably of the 
uncertainty estimates of not only the sample points, but also the estimation port x. 
Let {x.}" ,be a set of measurements in d-dimensional space such that each * has 
associated with it a mean Ui (in and adxd covariance matrix "Li. Also, let x (with 
m ean p x and covariance « be the current measurement whose probabmty is to be 
estimated. We define the multivariate hybrid density estimator as: 
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where the bandwidth matrix H(x; Xi ) is a function of both the estimation measurement 
and ^admeasurement. Chen, H. and P. Meer: 'Robust Computer V 1S1 on 
through Kernel Density Estimation'. European Conference on Computer Vmon, 
Copenhagen, Denmark, pp. I: 236-250, 2002 have suggested using H, for 
Epanechnikov kernels for the case where no error in the estimation measurement is 
available. Expanding on this idea, we propose the use of (x,*,)-S, + Z, as a 
possible bandwidth matrix for the Normal kernel. Thus, the density estimator 
becomes : 
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Given me samp,e measurements, .his estimator is a taction of both the mean 
and me eovariance of the estimation measnrement. Thus, it is no. a den* nanetron 
m ,he baditional sense wh.eh is a fnne.ion on,y of .he estimation pom, ,n * 
di mens,ona, space. However, if the e„var,anoema ttl x & is aeptfrxed,,, becomes a 

proper density function. 

^particular choice for the bandwidth Action has a strong mathen^tical 

rneans^andcovariancematricesW.^x^N^X,,. 1.2 

th at ifx,andx 2 are dependent, the distribution of (x, -x 2 ) ^( + 

Thus, the probability that x, - x 2 = 0 or x, - x 2 is 

, ___J (-Jon ~ *nf (=1 + <" " ^ 2) ) 

Thns EquationScanbemongh.ofaa.heaverageof.heprobabilities.ha.me 

!ll-«^•.*^'-^- tai - 4,, " ,,,,, " 

sample measurements. 

Ue choree for the ba„dw,d,h matnx can also be justitied by .he fact .ha, .be 
erections in which rhere is more nnce«ain.y ehher ,n the estimation mea— or 

0 the results obtained using .bis criterion were quite satisfactory and compared 

niyw,mme f ,aedbandwid.he S tima.o r (byne g lec«„ g fheavailab,ecovana„ces 

IZJr, or the b,— e-poin. estimators (by nesting me uncerta,n,y m 
the sample or erfmr.no" measurements, respectively). 

N ex«, measurementoffeaWes and men uncertainties is considered. Towards 
25 te introduction of.be method for estimating the dens,* taction in the presence of 
uncertainties, i, has been assumed ma, the uncertainties in me measutements ar 
:lb 1 e.Here,,hememod S ,ba,m,gb,beusedforob,ai»,ng,hemea—aud 

their associated uncertainties are considered. 

Severn, optica, How algorimms and ,heir ex,ensions can be considered. See 

» S,mo„ce„i,E, »*^ I ^ < ^'^^S 
Vision an, Applications, Ac^mc Press, Vol. 2. pp. 397-^22, ,999, 
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Lucas, B. and T. Kanade: 'An Iterative Image Registration Technique with an 
Application to Stereo Vision', DARPASl.pv 121-130, 1981; Jr.,Hom, B. and E. 
Weldon-Jr.: 'Direct Methods for Recovering Motion'. International Journal of 
ComputerVision 2(1), 51-76, 1988; Comaniciu, D.: 'Nonparametric Information 
Fusion for Motion Estimation', IEEE International, 2003b; Simoncelli, E., E. 
Adelson, and D. Heeger: 'probability distributions of optical flow', IEEE 
International Conference on Computer Vision and Pattern Recognition, Mauii, 
Hawaii, pp. 310-315, 1991; Weber, J. and J. Malik: 'Robust Computation of 
Optical-Flow in a Multiscale Differential Framework' , International Journal of 
Computer Vision 14(1), 67-81, 1995; Anandan, P.: 'A Computational Framework and 
an Algorithm for the Measurement of Visual Motion', International Journal of 
Computer Vision 2(3), 283-3 10, 1989; Szeliski, R.: 'Bayesian Modeling of 
Uncertainty in Low-Level Vision', International Journal of Computer Vision 5(3), 
271-302, 1990; Fleet, D. and K. Langley: , 'Recursive Filters For Optical-Flow', 
IEEE Transactions on Pattern Analysis and Machine Intelligence 17(1), 61-67, 1995; 
and Singh, A. and P. Allen: 'Image-Flow Computation: An Estimation-Theoretic 
Framework and a Unified Perspective', Computer Vision, Graphics and Image 
Processing 56(2), 152-177, 1992. See Barron, J., D. Fleet, and S. Beauchemin: 
'Performance of Optical Flow Techniques'. International Journal of Computer Vision 
12(1), 43-77, 1994 for a survey and performance analysis of existing methods. The 
one that is most suitable for the approach in accordance with principles of the present 
invention and which is described next, has the desired characteristic of being able to 
determine the error characteristics of the optical flow estimate. The method proposed 
by Simoncelli (see the above-cited publication by Simoncelli, 1999) is an extension to 
the method of Lucas and Kanade (see the above-cited publication by Lucas and 
Kanade, 1981). The basic idea of this algorithm is to apply the optical flow constraint 
equation (see the above-cited publication by Horn and Weldon-Jr., 1988): 



v r g./+g,=o 

30 
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*re Vg and „ are ft. spafal image grtadien. and tempera, derivative, respechvely 
of the image a, a given spatial location and rime, and/is rhe rwo-dimensiona, ve.oc.ry 
veeror. Sueh an equation purs only one constrain, on the two paramerers (apertnre 
problem). Thna, a smoothness constram, on .he field of velocity veerors b a common 
selection ,o address .his .imitation. If we assume locaUy consran. velocity and 
combme linear consols over local spa.ia. regions, a S um-of-s q uares error functron 
can be defined: 

E (f) = £w ( [V T iK*i,t)/ + ifc(*i>*)l 2 

i 

* vi^n with resnect to /-yields (see the above-cited publication 
Minimizing this error function with respect 107 yieiuo v 

by Simoncelli, 1999) for more details): 
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where 



M 



= r 8 = lEffxS„ £4 J ! I Eft* 



(4) 



and all the summations are over the patch. 

Simoncelli, (see .he above-cUed publication by Simonceffi, !999) introduces a 
nrode. for .he uncertamties in the following way. Define/as .he optica, flow./as the 
acma, ve.oc.ty field, and „ as the random variable describing the difference berweeu 
20 the two. Then: 

/=/+«. 

Similarly, let g t be the actual temporal derivative, and g t the measured denvative. 
Then: 

A 

g,=g,+"i 

where „ , is a random variable charac.eriz.ng the uncertainty in this measurement 
relative to me hue derivative. The uncertainty in fire spatial derivatives is assumed .o 
be much smaller man me uncertainty in the temporal derivatives. 

Under the assumption ma. and « are normally dis.ribu.ed w„h covariance 
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maTices Ai - U and *-». (« * ™> the *>" VeC ' OT f haS ' Zer °" mea " , 

No™, prior distribution with covanance Ap, the covanance and mean of the oprioal 

flow vector can be estimated: 
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(5) 
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where w, is a weighttng funerion over fte patch, with the pohus in the patch indexed 
by and M„ and b, are the same as maMces defined in Elation 4 bn. wiehoo. me 
snmmation and evaluated at location x ( . 

In order to handle significant displacements, a multi-scale approach is 
considered «ha, uses .he flow estimates ftom a higher scale to inuiahze the flow for a 
,ower level. Towards the propagation of variance across scales, a Kalman filter ts 
USKl with the normally used rime variable replaced by the scale. Refer to the ortgmal 
paper by Simouce.li, (see .he above-Oed pnb.icafion by Simonce.li, 1999) for flutter 
details. 

Estimation of the Covanance Matrix is next considered. The covariance I, for 
each observation x, (in 5D space) can be derived from the covariances of the 
components -the normalized color and optical flow. Suppose *, G, and S are the 
values observedatthe P1 xel. Let S = R + G + B,r = R/S and g = G/S. Assumingthat 
the sensor noise (in RGB space) is normally distributed with a diagonal covanance 
matrix having diagonal terms a, it was shown (see the above-cited publication by 
Greiffenhagen et «L, 2000) that the uncertainties in the normalized estimates r and g 
can be approximated by: 



25 Assuming that intensity and optical flow features are uncorrected (wluch may not be 
true in general), an expression for S, can be derived: 
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where A, from Equation 5 is used and boidfaee O a represent the appropriate zero 
matrices. 

Experimental results are next considered. In order to validate the proposed technique, 
the challenging scene of an ocean front was considered. Such a scene involves wave 
motion, blowing grass, long-term changes due to tides, global illumination changes, 
shadows, etc. An assessment on the performance of the existing methods (see the 
above-cited publications by Gnmson et al., 1998; and Elgammal et 1. 2000) » 
shown in Figure 2. Even though these techniques were able to cope to some extent 
with the appearance change of the scene, their performance can be considered 
unsatisfactory for video based surveillance systems. 

The detection of events was either associated with a non-acceptable false 
alarm rate or the detection was compromised when focus was given to reducing the 
false alarm rate. The algonthm in accordance with the present invention was able to 
detect events of interest on the land and simulated events on the ocean front wnh 
extremely low false alarm rate as shown in Figures 13 and 14. 

Based on the experiments, it can be said that the approach in accordance with 
the present invention was able to capture the dynamic structure of the ocean front as 
we ,l as the blowing grass. At the same time, dealing with the static parts of the scene 
is trivial with the proposed framework. 

Non-predictive density-based methods neglect the order information in the 
sample observations. Although the use of inter-frame optical flow in a higher 
dimensional space mitigates some of the limitations in this approach, longer term 
temporal relationships are still not captured. In order to overcome this limitation, one 
can consider predictive methods that try to model the observation as a time senes and 
develop a dynamic model for predicting the current observation based on past ones. 
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Predictive Methods are next considered. In this section, prior known 
approaches are surveyed firs,, mostly hased on the linear Kalman filter. Then, a much 
m „re complex method in accordance with the present invention ts developed based on 
subspace analysis of the signal. The method is able to capture .he long term dynamtc 

5 characteristics of the scene, temporal and smrcUtral relationships between Afferent 
pixels and multiple modalities of dynamic behavior. 

Moreover, the method in accordance with .he presen. invention adapts the 
detection parameters according to .he variation presen, in the scene leading to a 
promising .ool for change detection in dynamic scenes that exhibit a consrstent pattern 

10 of change. 

Prior work on predictive methods consists of models of varying complexity. 
Continuous functions of a particular type (e.g. polynomials) may be leant, from pas. 
observations and used to mode, the dynamics of a particular pixel. See me following, 
Rrdder C, O. Munkelr, and H. Kirchner; -Adaptive background estimation and 
,5 foreground detection using Kalman filtering', Proc. MernaUonal Conference on 
recent Advances in Mechanics, 193-199, 1995; Karmann, K.-P, A. von Brandt, 
andR Gerl: Signal Processing V: Theories and Application, Chap,. Moving Object 
Segmentation based on adaptive reference images. Elsevrer, Amsterdam, The 
Netherlands, 1990; Karmann, K.-P. and A. von Brandt; V Cappelllnl (ed). Tlnte 
20 ^,„ gfm <,geProc eSsi »gun«^g0^c, ta o gn i,io„,V„,.2,Chap,.Mov,ng 

Object Recognition Using an Adaptive Background Memory, Elsevier, Amsterdam, 
The Netherlands, 1990; and Koller, D„ I.Weber, and J. Malrk; 'Robust Multiple Car 
Tracking Witt, Occlusion Reasoning', EuropeanConference on Computer V.s,on, 
Stockholm, Sweden, pp. 189-196, 1994; wherein ,. is proposed to use a Ka.man-fil.er 
25 based dynamical model for each pixel. The Kalman filter tries ,o estimate Are state x 

eR" of a discrete-time controlled process tat is governed by the linear stochastic 

difference equation: 

x^Ax^+Bu, 
with a measurement zeR": 

z=Hx, 

Assuming a Gaussian nense model, the parameters of the model can be estimated 
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recursively. It includes a prediction step: 

- Ai^i + But, Pi = AP t „ t A T + Q 

where x; is the predicted state at time t h and P t is the predicted error covariance 
matrix. Q is the process noise covariance matrix. The measurement update step 
includes: 

K t = PfH T (HP-H T + R)~ 1 
x t = &t +K t {zt- H% ) 
P t ^ (I - K t H)P t ~ 

where K, is the Kalman gain matrix and R is the measurement noise covariance 
matrix. The process is recursive: the previous a posteriori estimates are used to 
predict the new a priori estimates and the current measurement is used to correct the 
estimates to obtain a posteriori estimates. 

Koller et al. (See the above cited publication by Roller et al., 1994) use a 
simple state model where the state is a simple ID value corresponding to the 
background intensity. The state (background image) is updated differently depending 
on whether it is hypothesized to be part of the background or not: 

B t+i = Bt + («i(l -M t ) + a 2 M t )D t 



where B t represents the background model at time t and D, is the difference between 
the present frame and the background model. Mt is the binary mask of hypothesized 
moving objects in the current frame: 

M t (x) = | 0 dse 

where T, is the threshold at time t. The gains a, and a 2 are based on an estimate of the 
rate of change of the background and depend on whether the pixel is hypothesized to 
be background or foreground. 

In the publication by Toyama, K., J. Krumm, B. Brumitt, and B. Meyers: 
'Wallflower: Principles and Practice of Background Maintenance', IEEE 
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Mernatona, Conferee on Conner Vis*, Kericyra, Greece, pp. 25 5 -26>, .999, 
simplervers.onof.heKalrnanfil.erisused.cai.ed^/^U.a.opera.es 

Jcfly on ft. da.a rather ,ha„ recursive* capturing the essence of pas. observations 
in a state vector. Mention of snch pixeMevel method with region and frame-level 
information leads to a very promising solution. 

„ is posstble to consider Kalman fillers that nsc more complex state vectors 
„, a Hnc,udea„estima,eo,meve,oci,y or evenhtgher order dynamics of the tntensiry 
variation. These Itnear-ftlter driven methods wonld be able .0 capture the behavor of 
som e S imp.e dynamtc backgrounds. However, even snch ex<enaion* t suffer ftom 
various limttations: (,) restriction of these fthers ,o hnear models and (n) the U of 
abiliry to capbare multiple modaliries and complex relationships between netghbonng 
pixels. 

Snbsnace modeling, prediction and subtraction of dynamic scenes are next 
considered In thts section, a novel approach to modeling dynamic scenes, ,n 
accordance with an aspect of the present invention, tat leams the dynamtcal 
characteristics of the image as a who,e (as opposed to ptxel-based approaches 
presented so far). The pnnciple is ,o learn a productive mode, for the scene ftat . able 
,„ pmdic, me next frame based on the current and pas. observation, Ideas from 
Boretto e. al. (see the above-cited publication by Dore.,0 e, a,., 2003) are adapted .o 
.present the cutren, state of the scene in an appropna.e,y chosen subspace and* 
.earn the dynamical mode, of the scene in the state space. Stnce fte mo*, mus. be 
.earn, onhne and must adapt to the changing background, an tncrementa, method for 
adapting the model parameters tsdeve.oped. Also developed are robns. and 
m eaningfu,n,easures.ha,a,lowthepresen.me,hod to de.ee. if the cutren. utput 
de via,es from .he mode, prediction in a significant way and thus, shou,d be flagged as 



25 

"new 



Firs, an outltne of the method tn accordance win, an embodiment of the 
invention win be desenbed, foUowed by a more detailed disCosnre, including frtrther 
considerations and a a more detailed analysis of process in accordance with vanous 
30 embodiments of the invention. 

Figure 4 shows a flowchart re,a,ing to an embodtment utilizing the principle of 
.earning a predictive mode, for the scene that is ab,e to predict the nex, frame based 
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on the current and past observations. An outline of an algorithm in accordance with 
an aspect of the invention is next disclosed. 

The image is divided into blocks, and for each block, PCA is performed using 
a certain number of previous frames. Regarding PCA, see I.T. Jolhffe: Principal 

5 Component Analysis, Springler-Verlag, 1986. Each block is treated as a vector and 
the dot product of the vector with the basis vectors is found. The set of values forms 
the current state vector (of dimensionality = the number of principal components 
retained). An auto-regressive model is also learnt using the states observed over time 
so that the model gives us the state at the current time instant given the states at some 

,0 previous time instants. For detection, we test whether the current image vector can be 
projected onto the basis vectors. If it cannot be projected, we say that there is a new 
object in the scene. We also compare the current state to the predicted state and if the 
difference is substantial, we say that there is a new object that is moving differently 
from the background. 

15 There follows next a more detailed description of an embodiment in 

accordance with the invention. 

Scene modeling is next considered. Let {I(t)}, =1 .., be a given set of images. (In 
order to achieve insensitivity to changes in illumination, the input may be transformed 
into a normalized space ofS = R + G + B,r = R/S, and g = G/S, and the quantity S 
20 divided by an appropriate constant.) A central idea behind the approach in 

accordance with the present invention is to generate a prediction mechanism that can 
determine the actual frame using the k latest observed images. Such an objective can 
be defined in a more rigorous mathematical formulation as follows: 

W*) = /<!(* " !)' *<* - 2), . . . , I(* - *» 

25 where/, a it-th order function is to be determined. Quite often, information provided 
by the input images is rather complex and cannot be used in an efficient manner for 
prediction. Furthermore, solving the inference problem leads to a high-dimensional 
search space. In that case, complex techniques and a significant amount of samples 
are required in order to recover a meaningful solution. One can address this limitation 

30 by the use of spatial filter operators. Complexity reduction of the search space and 
efficient data representation are the outcome of such a procedure. Let W*=*' be a 
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filter bank and %,(t) = M*))> the output of convolution between the operator^ and the 
image I(t). The outcome of such a convolution process can be combined into a vector 
that represents the current state s(t) of the system. 

s r (() = h(t),...,%Wl 

Wavelet operators, Gabor filters, anisotropic non-linear filters can be considered. 
Within the proposed framework in accordance with a described embodiment of the 
present invention, linear filters are considered. Limited complexity and the existence 
of efficient algorithms are the main motivation for such selection. Moreover, such 
filters are able to capture a significant amount of variations in real scenes. 

Feature space is next considered. Based on the predictive model that was earlier 
introduced, a similar concept can be defined in the state space. Similar notation can be 
considered leading to 

Spred (*) = /(s(*-l) ! s{*-2) 5 >.- ; #~fc)) 



Several techniques can be considered to determine such prediction model. Principal 
component analysis refers to a linear transformation of variables that retains - for a 
given number n of operators - the largest amount of variation within the training 
data. See Arun, K. and S. Kung: 'Balanced Approximations of Stochastic Systems', 
SIAM Journal of Matrix Analysis and Applications 11(1), 42-68, 1990; Jolliffe, L: 
1986, Principal Component Analysis, Springler-Verlag, 1986; de la Torre, F. and M. 
Black: 'Robust Principal Component Analysis for Computer vision', IEEE 
International Conference on Computer Vision, Vancouver, Canada, pp. I: 362-369, 
2001 . It is also referred to as the Karhunen-Loeve Expansion (See M. M. Loeve, 
Probability Theory, K. Van Nostrand, Princeton, 1955.) In a prediction 
mechanism, such a module can retain and recover in an incremental manner the core 
variations of the observed data. 

The estimation of such operators will be addressed in the next section; in order 
to facilitate the introduction of the method at a concept level, one can consider them 
) known, 
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{b, = 

We consider these operators to produce the state vector s(t): 

s(t )=[bTi f b^i ! ... > brn T =B T i 

5 where B = [bl; b2; b3; : : : ; bj denotes the matrix of basis vectors. 

A prediction mechanism is next considered. The next step - given the selected 
feature space - refers to the modeling and estimation of the prediction function/. In 
accordance with an aspect of the present invention , one can consider various forms 
(linear or non-linear) for such prediction mechanism. Non-linear mechanisms 

10 involve higher sophistication and can capture more complicated structures. However, 
the estimation of such functions is computationally expensive and suffers from 
instability. 

It is herein recognized that linear models are a good compromise between low 
cor 
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omplexity and a fairly good approximation of the observed structure. Auto- 
regressive models of a certain order k can approximate and predict the actual 
observation based on the latest k feature vectors. The predicted state will be a linear 
combination of these vectors: 



20 



where A is an n x n prediction matrix. Then, prediction in the image space can be 
computed using the pseudo-inverse of B . 

IpredP) =pse,udoinv(B T ) -Spredf) 

where the pseudo-inverse is defined as 

pWom«(B T ) = (BB T )~ 1 B = B 

25 

31 



10 



since BB T = / (The basis vectors are orthogonal and have unit norm). Thus, the 
unknown variables of the present scene model comprise the basis vectors and the 
auto-regres S1 ve matrix. Visual appearance of indoors/outdoors scenes evolves over 
time Global and local illumination changes, position of the light sources, tidal 
changes, etc. are some examples of such dynannc behavior. One can account for such 
changes by continuously updating both the basis vectors and the predictive model 
according to the changes on the observed scene. Last, but not least, the 
d 1S criminability of the model should be preserved in order to perform accurate 
detection. 

Model construction is next considered. Estimation of the ba S1 s vectors from 
the observed data set can be performed through singular value decomposition. Then 
one can update such estimation as follows: (i) Considering an observation set that 
consists of the last.frames and recomputing the SVD based on the new data, (n) 
Performing an incremental update of the basis vectors with exponential forgettmg 
where every new frame is used to revise the estimate of these vectors. Similar 
procedures can be considered when recovering the parameters of the auto-regressive 
model. 

The Estimation of Basis Vectors is next considered. Considering Batch PCA: 
Let {I(t)h-, . be a column vector representation of the previous m observations. We 

> assumethatthedim^^ a 
m ean assumption can be considered for the ^/by estimating the mean vector a and 
subtracting from the training samples. Given the set of training examples and the 
mean vector, one can define the d x d covariance matrix: 

S I = JE{I{e)I T (*)} 

It is well known that the principal orthogonal directions of maximum variation 
for I(t) are the eigenvectors of I, . See the above-cited publication by Jolliffe, 1986. 
Therefore, one can assume that the use of such vectors is an appropriate selection for 
the filter bank. 

One can replace the I, with the sample covariance matrix that is given by 
30 where I M is the matrix formed by concatenating the set of images {I(t)} t =, .... Then, 
the eigenvectors of I, can be computed through the singular value decomposition 
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(SVD) ofI M : 

I M = UEV T 

The eigenvectors of the covariance matrix I, are the columns of the matrix U 
while the elements of the diagonal matrix I refer to the variance of the data in the 
, direction of the basis vectors. Such information can be used to determine the number 
of basis vectors (n) required to retain a certain percentage of the variance in the data. 

Examples of retained eigenvectors are shown in Figure 6. Information related 
with their magnitude and number are given in Figure 7. The image is divided mto 
equal size blocks to reduce complexity. 

Cohering Incremental PCA: The batch method is computationally inefficient 
and cannot be performed at each frame. Therefore, we consider a fast incremental 
method. In this method, the current estimate of the basis vectors is updated based on 
the new observation and the effect of the previous observations is exponentially 
reduced. Several methods for incremental PCA (IPCA) can be considered. See 
,5 Weng, J., Y. Zhang, and W. Hwang: 2003, 'Candid Covariance-free Incremental 

Principal Component Analysis', IEEE Transactions on Pattern Analysis and Machrne 
Intelligence 25(8), 2003; Oja, E. and J. Karhunen: 'On stochastic app—on of 
the eigenvectors and eigenvalues of the expectafion of a random ratinC, Journal of 
MathernaticalAnalysisandApplicat^ Brand, M, 'Incremental 

20 Singular Value Decomposition ofUncertain Data with Missing Values', European 
Conference on Computer Vision. Copenhagen, Denmark, p. I: 707 ff. 2002. A 
method developed by Weng et al. is adapted to suit the present application. See the 
afore-mentioned publication by Weng et al. 

The Amnesic Mean is next considered. Let be the previous m 
25 observations . The mean I m of the images can be computed incrementally : 




Iro-fl 



This computation gives equal we.ght to all of the past observations. In order to reduce 
the effect of previous samples, one can compute the amnesic mean: 
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i-m+1 




where / is called the amnesic parameter that determines the rate of deeay of the 
previous samples. If lis a fixed multiple of . (I - one obtains exponential decay 
5 in the effect of pas. samples. Typical values of x .ha. were used were between 0.01 
and 0.05. 

The Update of the Basis Vectors is next considered. 

Le. (b, , b.) be the current set of estimates of me basis vectors. For reasons 
„ ma. become apparen, la«er, .hese vectors are no. normalted, although .hey are 
Orthogonal. However, ,hey can be nortnahzed for use in Are background model. 
Also le. m be me number of pas. observations. Now, suppose we observe a new 
image W Then, we update the firs, bas,s vector b, by essentially "pulling" ,. m the 
direction of by an amount equal to the projection of W, onto the unit vector 



15 along bi: 



lm+i II 

Here, W isthe projection of I„, inthedirec,io„ofb„and Kh is.h=u„„ 
20 vector in 

the direction of I m +i • 

Next, we compute the residue of Ri of Wi on bj: 

~ W V Pill 2 ) 



34 



This residue is perpendicular to b, and is used to "pull" b 2 in the direction of R, by an 
amount equal to the projection of R, onto the unit vector along b 2 : 



(— ) • b2 + \~ JvFiw; 1 

The residue R2 is calculated similarly: 

R 2 = Rl -ProjbjlR-l) 

= Mint) 152 

This residue is peipendicular to the span of < b,b 2 >. This procedure is repeated for 
each subsequent basis vector such that the basis vector bj is pulled towards I m+ , m a 

0 direction perpendicular to the span of < bi : : : :bj-i >• 

Zhang et al. have proved that, with the above algorithm, b t -> ±U as n -» - . See 
Zhang, Y. and J. Weng: 2001, 'Convergence analysis of complementary candid 
incremental principal component analysis', Technical report, Department of 
Computer Science and Engineering, Michigan State University, 2001. Here, is the 

,5 i-th largest eigenvalue of the covariance matrix I„ and 4 is the corresponding 

eigenvector. Note that the obtained vector has a scale of h and is not a unit vector. 
Therefore, in our application we store these unnormalized vectors. The magnitude 
yields the eigenvalue and normalization yields the eigenvector at any iteration. 
Estimation of the predictive mode is next considered. Considereing the Batch 

20 Update: 

As stated earlier, we will use a linear auto-regressive model to model the 
transformation of states: 



k 



25 for a fc-th order auto-regressive model. The parameters of the model, contained in A, 's, 
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can be recovered if a data se. of prev,oos stare transforms is avarlable. 

We iUosfrare the approach to be foilowen in compoting .he coefficients of the 
marrices At by considenng ,be case of ft = I- Le. - *» « — f ™ state 
vectors /sftjj: 

S2=ls(2),s(3) ! ...,s(t)] 
Sl= ls(lXs(2) > ... s s(t-l)1 

^ prerf (0=St 1 ^^-° = ^^" 1) onthestate 
Now, the application ot 

transformations yrelds an over-consttained se, of iinear e q ua„ons. Th,s se, of 
^nations can men be soived for rhe bes, A in rhe sense of ,eaa, annates error ^ 
1 1 S2 - A Hi I by .he method of nor- a,—- * (See .he above-cited pobhcauon by 
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Doretto et al., 2003): 

A = S2.Sl T (Sl«Sl T y 1 

A closed form solution for the best parameters of the auto-regressive model in 
15 aleastsquaressenseispossibleforany^Thisisachievedbysolvinganover- 

15 ^^^^^^^^^^ 

s(t ) = ^ - 0, where the coefficients of the n x „ square matrices Ai are the 
unknowns. These can be solved using the method of normal equations. 
The Incremental Update is next considered. 

Jcremel methods .„ n P da.e me A matiix are oons.dered ,„ address dns — . 
Wi.hon. boss of generality and for clanty and prese„,.,o„ panose, we cons. *, - 
anto-regressive mode! of order , . Ex.ens.on to h.gber order mode* - tnvaaHy done 
in a manner similar to the Batch method. 

Lat ns consider dre first row ai of A. This row vector yields .he Best 
component prediced s,a.e veCorby a linear combrnarion of.be pteviooa state, 
*. element of me row vecror a, bentg me nnknown coefficients. When we ave 
mul tip,e M transformations ava.lable from prevrons observations, .he problem 
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becomes one of incrementally solving an over-constrained set of linear equatxons 

ai •«(*) = + = 1 --- T 

where s,(t) is the first component of the state s(t) at time t. 

A traditional approach to an iterative solution of a system of linear equations 

5 is through Jacobi iterations or the more commonly used variant - Gauss-Seidel 
iterations See Golub, G. H. and C. V. Loan: 1996, Matrix Computations. John 
Hopkins Universe Press, 1996; and Burden, R. L. and J. D. Faires: Numerical 
Analysis, Prindle, Weber and Schmidt Press, Boston, 1985. The idea is to update 
one unknown variable at a time by rewriting the linear set of equations. Suppose we 

10 are trying to solve the 3-by-3 system Ax = b. This can be rewritten as: 



xi = (6i - anx2 - oi3Xs)/«n 

X 2 = (h - 021*1 ~ «23^3)/«22 
m = (63 - m\X\ ~ a32X2)/«33 



Suppose & is an approximation to x = A''b. A natural way to generate a new 
15 approximation x^ 0 is to compute 

x | fe+1 > = (62 - a 2 ix[ k) - cm4 ] )/<™ 

4 fc+1 > = (63 - anxP " «324 fc) )/ a 33 
This defines the Jacobi iteration for n = 3. For general n we have 

20 

for i = 1 : n 
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end 

The Jacobi iteration does not use the most recently available information when 
computing x^ 0 . The procedure could be revised to always use the most current 
estimate of leading to the Gauss-Seidel iterations. See the above-cited publication 
by Burden and Faires, 1985. 

Within the traditional approach, n equations available. Thus, the coefficient 
matrix (A in the system of equations Ax = b) is unchanged. The main reason for using 
this method, as opposed to the direct method that utilizes matrix inversion, is 
efficiency when solving very large systems, especially when they are sparse. We will 
modify this method to make it applicable for iterative update in a system of over- 
constrained linear equations. Specifically, using Gauss-Siedel iterations, the ith 
iponent of a, is updated at the (k + 7>th step as follows: 



comp 




15 We can rewrite this equation as: 



ft+l) _ C M + !i (7) 



where 



3=1 i=i+l 



(k) 



is the residual error. The second term in Equation 7 captures the effect of a new 
observation. In order to provide robustness to outliers and achieve exponential decay 
in the effect of past observations, one can modify this update equation as follows: 
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This is similar .0 .he so-called relaxaOon methoO, although the for a fixed coefficient 
matrix, it is used for better convergence. 

Choosing <a< 1, one obtains an exponential decay in me effect of pas, observations. 
Since only one component of the a vector can be updated a. a time, me new 
observations are used ,0 update one component a, a rime. Another update w,.h the 
same equation would give exactly the same result The component to be updated ts 
rotated among the components of a. 

A simple mechanism to perform detection is by comparing the prediction with 
the actual observation. Under the assumption that the auto-regressive model was bunt 
using background samples, such technique will prov.de poor prediction for objects 
while being able to capture the background. Thus, based on the subspace model for 
the background, we fust predict the new state according to the previous state, The 
detection task is considered in a 2-dimensional space, 



15 <n;r 2 >. 



The firs, component of such space, name.y r„ is a fitness measure of me projection of 
the current vector onto the subspace, given by: 

s(t)« 



nit) 



PWI 



Such measure repmsents the percentage of the original vector captured by .he state 
» vector. Low values of such measure have a simple interpretation: me original vector 
does not lie in me subspace, and may cottespond .0 a new objeo. in .he scene. In o.her 
words, .his is a measure of change of the scene stntcture. Such case can occur ether 
because of changes in the appearance of .he scene (color), or because of structura 
change, Such technique can better detect objects .0 some extent than the standard 
25 background subtraction techniques that are pixel-based. 

This measure can deal efficiently with changes of appearance in the structural 
sense bu, would fail to capture changes in the temporal domain. This can occur when 
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temporal information appears in a different order than the one for the background. To 
this end, one can consider the relative difference between the current state and the one 
predicted by the auto-regressive model: 

, A ||wgh_sg)! 

r2{t) = mm 

that is normalized by the magnitude of the state vector. Then, decisions on the 
detection of an event are adaptive to the scale of the state space (input data). 
Significant values of this error metric indicate the presence of a new object in the 
scene. Furthermore, such measure captures the change in the motion characteristics. 
Objects following motion trajectories different than the ones being considered (auto- 
regressive model) will reflect to important values for r 2 . Such metric is an additional 
cue for detection based on structural motion that has not been considered in traditional 
background adaptation methods (Crimson et at, 1998; Elgammal et al., 2000). 

The two measures introduced above can be considered within a detection 
mechanism in various forms. Thresholding is the simplest one, where the user 
determines the degree of sensitivity for the system through the threshold values. The 
selection of such threshold values is a difficult task due to their dependence on the 
observation set. In order to adapt the thresholding to the data, one can calculate the 
mean and variance of r, and r 2 and use them to determine the thresholds in an 
dynamic fashion. 

An example of this detection mechanism is shown in Figure 9. In order to 
represent the two-dimensional feature space, a color representation was considered 
where green corresponds to r, and red to r 2 . In each block, the length of the color 
vectors correspond to the magnitude of the detection measures while detection is 
represented by white color. 
5 In the interest of efficiency and robustness, it is recognized that it is preferable 

that this comparison of the prediction with the actual observation be performed in the 
state space and that the statistics of the training data be taken into consideration in the 
comparison. 

Two types of changes in the signal may be considered for detection: (1) 
,0 "structural" change in the appearance of pixel intensities in a given region, and (2) 
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change in the motion characteristics of the signal. Measures are developed in order to 
detect each of these types of changes. 

In order to develop the approach for estimating structural change in the signal, 
we begin by reviewing some concepts in Principal Component Analysis and its 
relationship to density estimation in a multi-dimensional space. Recall that I is a d 
dimensional vector representation of the mean subtracted input I, such that only n 
principal components are retained in the subspace. The principal component analysis 
decomposes the vector space R d into two mutually exclusive and complementary 
subspaces: the principal subspace (or 

feature space) F = ^*= l containing the principal components and its orthogonal 

complement F = Then, using the definition s = B T • I, B being the matrix 

of basis vectors, the residual reconstruction error for an input vector I(t) is defined as: 



i (t) E a - pip ~ X>* 



See K. Fukunaga: Introduction to Statistical Pattern Recognition, Academic Press, 
Boston, second edition, 1990. 

This can be easily computed from the first n principal components and the L 2 - 
norm of the mean-normalized image I. Thus, the L 2 norm of every element x e R can 
be decomposed in terms of its projection in these two subspaces. The component in 
the orthogonal subspace F, referred to as the "distance-from-feature-space" (DFFS), 
is a simple Euclidean distance equivalent to e 2 (x). See B. Moghaddam and A.P. 
Pentland: Probabilistic visual learning for object detection, ICCV, pages 786-793, 
Boston, MA, 1995; and B. Moghaddam and A.P. Pentland: Probabilistic visual 
learning for object representation, PAMI, 19(7):696-710, July 1997. 

Let us assume a Gaussian model for the density in high-dimensional space. 
More complicated models for the density, like mixture-of-Gaussians, or non- 
parametric approaches can also be considered and easily integrated by explicitly 
building a background model on the state space. However, for simplicity and ease of 
use, we will restrict ourselves to Gaussian densities in this paper. If we assume that 
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the mean I and covariance I of the distribution has been estimated robustly, the 
likelihood of an input I to belong to the background class Q is given by: 



5 

The sufficient statistic for characterizing this likelihood is the Mahalanobis distance: 

d(I) = FE- 1 J 

where 1 = 1-1. Utilizing the eigenvalue decomposition of X: Z: = BAB 7 where B is 
the eigenvector matrix of £ and is equal to the matrix of basis vectors B used earlier 
10 and A is the diagonal matrix of eigenvalues (= D 2 in Equation 3), the Mahalanobis 
distance can be written as: 

d{l) = FE- 1 ! 

= F[BA 1 B T JI 
= s T A l s 

since B r I = s. Due to the diagonal form of A, we can rewrite this equation as: 




15 where X\ is the i-th eigenvalue. If we seek to estimate d(I) using only the n principal 
projections, one can formulate an optimum estimator for d(l) as follows: 



n 2 
sf 



1 



E s 

J=n+1 



n 2 



1 



E| + V(i) 



(9) 
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where e 2 (I) is the DFFS defined above and can be computed using the first n 
principal components. In the above-cited publication by Moghaddam et. al. it is 



shown that an optimal p in terms of a suitable error measure based on the Kullback- 
5 Leibler divergence is: 



See M. Cover and J. A. Thomas: Elements of Information Theory, John Wiley and 
Sons, New York, 1994. 

It is herein proposed that d(l) be used as the first detection measure r h Such measure 
10 can be calculated by utilizing only the first n principal components. It is an optimum 
measure for estimating the distance from the Gaussian density represented by the 
principal component analysis such that the covariances of the data are properly taken 
into account while estimating the difference. High values of such distance measure 
have a simple interpretation: the original vector is not close to the training data, and 
15 may correspond to a new object in the scene. In other words, this is a measure of 
change of the scene structure. Such case can occur either because of changes in the 
appearance of the scene (color), or because of structural changes. Therefore, such 
technique can better detect objects than the standard background subtraction 
techniques that consider each pixel individually without consideration of the 
20 relationships between them. 

Change in Motion Characteristics is next considered. The measure rj can deal 
efficiently with changes of appearance in the structural sense but would fail to capture 
changes in the temporal domain. This can occur when temporal information appears 
in a different order than the one for the background. To this end, one can consider the 
25 SSD (Sum of squared differences) error between the input and predicted image, which 
can be expressed as the square of the L 2 norm of the difference between the vectorized 

input and predicted images: ^ " lpr «^ T Since any vector I may be written in 
terms of its components along the basis vectors, "~ ^ i=1 S% *' we may write: 
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Therefore, the norm of this vector may be computed thus: 



|i-W!l§Hl£(*-^) B *i 



2 



=$>- s r d ) 2 + Ew 

since the prediction is made from only the first * components, and therefore j/*** = ft i 
+ L..d. Recalling the definition of e 2 (I), we obtain: 



n-wi}=i>-«r'> ,+ ^ l > 

i=l 



Again, this quantity may be computed from only the first n components. Since the 
effect of the second term has already been captured in r h we define 



r 2 (f)=E{s i -sr d ) 2 =IN~« Pr€<1 ili 
1=1 



where the state vectors are considered only up to the n principal components. 
Significant values of this error metric indicate the presence of a new object in the 
scene. Furthermore, such measure captures the change in the motion characteristics. 
Objects following motion trajectories different than the ones being considered (auto- 
regressive model) will reflect to important values for r 2 . Such metric is an additional 
for detection based on structural motion that has not been considered in traditional 
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background adaptation methods. See W.E.L. Crimson, C. Stauffer, R. Romano, and 
L. Lee: Using adaptive tracking to classify and monitor activities in a site, CVPR, 
Santa Barbara, CA, June 1998; and the above-cited paper by A. Elgammal et al., 
2000. 

Implementation and experiments are next considered. 

Real-time processing is a standard requirement of video surveillance. In 
particular, when dealing with techniques that are aimed at background adaptation, 
such a requirement is strictly enforced. Changes of the background structure should 
be captured from the model to preserve satisfactory detection rate. 

Computing the basis components for large vectors is a time consuming 
operation. Optimal algorithms for singular value decomposition of an m x n matrix 
take 0(mn + n 3 )) time. See G. H. Golub and C.F. Van Loan: Matrix Computations, 
John Hopkins University Press, 1996. 

A simple way to deal with such complexity is by considering the process at a 
block level. To this end, we divide the image into blocks and run the algorithm 
independently on each block. For each of these blocks, the number of components 
retained is determined dynamically by the singular values (which refer to the standard 
deviation in the direction of basis vectors). Also determined by the singular values is 
the number of past images over which the SVD is computed (for the non-incremental 
method). This is because higher variation in a region suggests that more images would 
be required to model it. Furthermore, we compute the PCA only over those frames 
that are not well modeled by the current basis vectors. This enables us to capture the 
variation of the data over a much longer time window with the same computational 
cost. 

Such mechanism leads to a quasi real-time (~5fps) implementation for a 340 x 
240 3-band video stream on a 2.2 GHz Pentium IV processor machine. 

In order to validate the proposed technique, experiments were conducted on 
two different types of scenes. First, results are shown on the challenging scene of the 
ocean front. Such a scene involves wave motion, blowing grass, long-term changes 
due to tides, global illumination changes, shadows, etc. An assessment on the 
performance of the existing methods is shown in Figure 2. See the afore-cited 
publications by C. Stauffer et al. and A. Elgammal et al, 2000. Even though these 
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techniques were able to cope to some extent with the appearance change of the scene, 
their performance can be considered unsatisfactory for video based surveillance 
systems. 

The detection of events was either associated with a non-acceptable false 
5 alarm rate or the detection was compromised when focus was given to reducing the 
false alarm rate. The algorithm in accordance with the invention was able to detect 
events of interest in the land and simulated events on the ocean front as shown in 
Figure 10. Note that the algorithm is largely able to handle waving trees 
automatically without any parameter adjustments for different blocks, with the 
10 exception of very large movements, as in the fourth image, 

The essence of the approach is depicted in Figures 9 and 10. Observation as 
well as prediction are presented for comparison. Visually, one can conclude that the 
prediction is rather close to the actual observation for the background component. On 
the other hand, prediction quality deteriorates when non-background structures appear 
15 in the scene. A more elaborate technique to validate prediction is through the 

detection process as shown in Figure 10 which shows results from a sequence of a 
road with waving trees, wherein are shown: left: input signal, middle: predicted signal, 
right: block-wise response of the detection measures, wherein green represents T\ , 
pink represents r 2 , and white represents detection in a block. 

20 . A Quantitative evaluation of the method can be considered through the ROC 
(Receiver-Operator Characteristic) curves. Figure 13 illustrates the ROC 
characteristics of our method for the sequence of the ocean front (Figure 1). In Figure 
13, the receiver-operator characteristic (ROC) curves are shown for a sequence for 
(i) mixture-of-Gaussians model, (ii) non-parametric kernels, and (iii) an embodiment 

25 in accordance with the present invention. 

Also shown for comparison purposes are the ROC curves for the existing 
techniques in Figure 14, which shows a receiver-operator characteristic (ROC) curves 
for another sequence for (i) mixture-of-Gaussians model, (ii) non-parametric kernels, 
and (iii) an embodiment in accordance with the present invention. 

30 As can be seen from the curves, there was a substantial improvement in the 

results as compared to existing methods. Most of this improvement was observed in 
the region of the ocean front and the blowing grass; the improvement in the static 
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parts of the scene, although quite significant, were not as marked and the performance 
of all three methods can be considered satisfactory in these regions. 

The second scene we consider is a typical traffic monitoring scene where the 
trees were blowing due to the wind (Figure 13). Results comparable to existing 
methods were obtained for the static parts of the scene (like the road). At the same 
time, false detections in the tree area was significantly reduced as compared to 
traditional methods. This was achieved without any manual parameter adjustments 
and objects even behind the trees (but visible in spots through the holes in the 
structure) were detected in some cases. These objects would have been impossible to 
detect with traditional methods that typically have to rely on pixel neighborhood 
analysis to remove outliers and thus lack the ability to correlate far away pixels. The 
ROC curve for this sequence is shown in Figure 11. 

A prediction-based on-line method for the modeling of dynamic scenes has 
been disclosed. A novel aspect of the present invention is the integration of a 
powerful set of filter operators within a linear prediction model towards the detection 
of events in a dynamic scene. Furthermore, on-line adaptation techniques have been 
disclosed to maintain the selection of the best filter operators and the prediction model 
and provably optimum detection measures have been developed that are adaptive to 
the complexity of the scene. 

The approach has been tested and validated using a challenging setting, 
including the detection of events on the coast line and the ocean front as shown in 
Figures 10 and 12. Large scale experiments that involved real events (Figures 10 and 
12) as well as simulated ones were conducted (Figure 9). Figure 12 shows results 
from another sequence, wherein in the later stages, a person is detected in spite of 
hiding behind a fence. The technique in accordance with an aspect of the present 
invention was able to detect such events with a minimal false alarm rate. Detection 
performance was a function of the complexity of the observed scene. High variation 
in the observation space reflected to a mechanism with limited discrimination power. 
The method was able to adapt with global and local illumination changes, weather 
changes, changes of the natural scene, etc. Validation has been performed by 
comparing the technique in accordance with the present invention with the state-of- 
the-art methods in background adaptation (Figures 2 and 14). The method in 
accordance with an aspect of the invention could meet and surpass in some cases the 
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performance of these techniques for stationary scenes, while being able to deal with 
more complex and evolving natural scenes. 

It will be understood that the present method operates optionally with any or 
all of an imaging device such as an electronic camera, a programmable computer, and 
5 an image viewing device or screen, and other apparatus known in the art. It will be 
further understood by one of ordinary skill in the art to which it pertains that, while 
the invention has been described by way of preferred embodiments, various changes, 
alterations, and substitutions may be made without departing from the spirit of the 
invention which is defined by the claims following. 
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